12 research outputs found

    Fitting Tree Metrics with Minimum Disagreements

    In the L0 Fitting Tree Metrics problem, we are given all pairwise distances among the elements of a set V, and our output is a tree metric on V. The goal is to minimize the number of pairwise distance disagreements between the input and the output. We provide an O(1) approximation for L0 Fitting Tree Metrics, which is asymptotically optimal as the problem is APX-Hard. For p ≥ 1, solutions to the related L_p Fitting Tree Metrics have typically used a reduction to L_p Fitting Constrained Ultrametrics. Even though in FOCS '22 Cohen-Addad et al. solved L0 Fitting (unconstrained) Ultrametrics within a constant approximation factor, their results did not extend to tree metrics. We identify two possible reasons, and provide simple techniques to circumvent them. Our framework does not modify the algorithm from Cohen-Addad et al.; rather, it extends any ρ approximation for L0 Fitting Ultrametrics to a 6ρ approximation for L0 Fitting Tree Metrics in a black-box fashion.
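The L0 objective is easy to state in code. Below is a minimal sketch of it (our own illustration, not the paper's algorithm; the helper names `tree_distances` and `disagreements` are ours), assuming the output tree is given as a weighted adjacency list:

```python
from heapq import heappush, heappop
from itertools import combinations

def tree_distances(adj, source):
    # Dijkstra on a small weighted graph given as {u: [(v, w), ...]}.
    # On a tree this computes the (unique) path lengths from source.
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heappop(heap)
        if d > dist[u]:
            continue
        for v, w in adj[u]:
            if v not in dist or d + w < dist[v]:
                dist[v] = d + w
                heappush(heap, (d + w, v))
    return dist

def disagreements(D, adj, points):
    # L0 objective: number of pairs whose input distance differs from the
    # tree distance. Recomputing distances per pair is for clarity, not speed.
    return sum(
        1
        for u, v in combinations(points, 2)
        if tree_distances(adj, u)[v] != D[(u, v)]
    )

# Toy instance: a path a - b - c with unit weights.
adj = {"a": [("b", 1)], "b": [("a", 1), ("c", 1)], "c": [("b", 1)]}
D = {("a", "b"): 1, ("a", "c"): 5, ("b", "c"): 1}
print(disagreements(D, adj, ["a", "b", "c"]))  # 1: only the pair (a, c) disagrees
```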

    Longest Common Subsequence on Weighted Sequences

    We consider the general problem of the Longest Common Subsequence (LCS) on weighted sequences. Weighted sequences are an extension of classical strings, where in each position every letter of the alphabet may occur with some probability. Previous results presented a PTAS and noticed that no FPTAS is possible unless P=NP. In this paper we essentially close the gap between upper and lower bounds by improving both. First, we provide an EPTAS for bounded alphabets (the most natural case), and prove that no EPTAS exists for unbounded alphabets unless FPT=W[1]. Furthermore, under the Exponential Time Hypothesis, we provide a lower bound which shows that no significantly better PTAS can exist for unbounded alphabets. As a side note, we prove that it is sufficient to work with only one threshold in the general variant of the problem.
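The central primitive, a string occurring as a subsequence of a weighted sequence with at least some probability, can be sketched with a small dynamic program. This is our own illustration, not the EPTAS from the paper, and the function name is ours:

```python
def max_subsequence_prob(weighted_seq, s):
    # weighted_seq: list of {letter: probability} distributions, one per position.
    # Returns the largest product of probabilities over all ways of matching s
    # as a subsequence; s occurs "with probability >= t" iff this value is >= t.
    best = [1.0] + [0.0] * len(s)  # best[j]: best product matching s[:j]
    for dist in weighted_seq:
        for j in range(len(s), 0, -1):  # reverse so each position matches once
            p = dist.get(s[j - 1], 0.0)
            if best[j - 1] * p > best[j]:
                best[j] = best[j - 1] * p
    return best[len(s)]

W = [{"a": 0.9, "b": 0.1}, {"a": 0.5, "b": 0.5}, {"b": 0.8, "a": 0.2}]
print(max_subsequence_prob(W, "ab"))  # ≈ 0.72 ('a' at position 0, 'b' at position 2)
```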

    Threshold-Based Network Structural Dynamics

    The interest in dynamic processes on networks has been steadily rising in recent years. In this paper, we consider the (α,β)-Thresholded Network Dynamics ((α,β)-Dynamics), where α ≤ β, in which only structural dynamics (dynamics of the network) are allowed, guided by local thresholding rules executed in each node. In particular, in each discrete round t, each pair of nodes u and v that are allowed to communicate by the scheduler computes a value E(u,v) (the potential of the pair) as a function of the local structure of the network at round t around the two nodes. If E(u,v) < α then the link (if it exists) between u and v is removed; if α ≤ E(u,v) < β then an existing link between u and v is maintained; if β ≤ E(u,v) then a link between u and v is established if not already present. The microscopic structure of (α,β)-Dynamics appears to be simple, so that we are able to rigorously argue about it, but still flexible, so that we are able to design meaningful microscopic local rules that give rise to interesting macroscopic behaviors. Our goals are the following: a) to investigate the properties of the (α,β)-Thresholded Network Dynamics and b) to show that (α,β)-Dynamics is expressive enough to solve complex problems on networks. Our contribution in these directions is twofold. We rigorously establish the claim about the expressiveness of (α,β)-Dynamics, both by designing a simple protocol that provably computes the k-core of the network and by showing that (α,β)-Dynamics is in fact Turing-Complete. 
Second, and most important, we construct general tools for proving stabilization that work for a subclass of (α,β)-Dynamics, and prove speed of convergence in a restricted setting. Comment: 29 pages, extension of the Post-print containing all proofs, to appear in SIROCCO 202
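One synchronous round of the threshold rule can be sketched as follows. The common-neighbors potential is our own illustrative choice of local rule (the framework allows many), and all names are ours:

```python
from itertools import combinations

def round_step(nodes, edges, alpha, beta, potential):
    # One synchronous round in which every pair is scheduled.
    # edges: set of frozensets {u, v}; potentials are evaluated on the old graph.
    new_edges = set(edges)
    for u, v in combinations(nodes, 2):
        e = potential(u, v, edges)
        pair = frozenset((u, v))
        if e < alpha:
            new_edges.discard(pair)      # E(u,v) < alpha: remove the link
        elif e >= beta:
            new_edges.add(pair)          # beta <= E(u,v): establish the link
        # alpha <= E(u,v) < beta: keep the link exactly as it is
    return new_edges

def common_neighbors(u, v, edges):
    # Example potential (an assumption, not mandated by the paper):
    # the number of common neighbors of u and v.
    nu = {w for e in edges if u in e for w in e if w != u}
    nv = {w for e in edges if v in e for w in e if w != v}
    return len(nu & nv)

# Triangle a-b-c with a pendant node d attached to c.
nodes = ["a", "b", "c", "d"]
edges = {frozenset(p) for p in [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]}
edges = round_step(nodes, edges, alpha=1, beta=2, potential=common_neighbors)
# The pendant edge c-d has no supporting common neighbor and is removed;
# the triangle survives.
```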

    Dynamic Dynamic Time Warping

    The Dynamic Time Warping (DTW) distance is a popular similarity measure for polygonal curves (i.e., sequences of points). It finds many theoretical and practical applications, especially for temporal data, and is known to be a robust, outlier-insensitive alternative to the Fréchet distance. For static curves of at most n points, the DTW distance can be computed in O(n^2) time in constant dimension. This tightly matches a SETH-based lower bound, even for curves in R^1. In this work, we study dynamic algorithms for the DTW distance. Here, the goal is to design a data structure that can be efficiently updated to accommodate local changes to one or both curves, such as inserting or deleting vertices, and that, after each operation, reports the updated DTW distance. We give such a data structure with update and query time O(n^{1.5} log n), where n is the maximum length of the curves. As our main result, we prove that our data structure is conditionally optimal, up to subpolynomial factors. More precisely, we prove that, already for curves in R^1, there is no dynamic algorithm to maintain the DTW distance with update and query time O(n^{1.5-δ}) for any constant δ > 0, unless the Negative-k-Clique Hypothesis fails. In fact, we give matching upper and lower bounds for various trade-offs between update and query time, even in cases where the lengths of the curves differ. Comment: To appear at SODA2
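For reference, the static O(n^2) bound mentioned above comes from the classic textbook recurrence; the sketch below (using |x - y| as the point distance for 1-D curves) is that standard dynamic program, not the paper's dynamic data structure:

```python
def dtw(a, b):
    # Classic O(n*m) dynamic program for the DTW distance of two 1-D curves.
    # dp[i][j]: cheapest alignment of a[:i] with b[:j].
    INF = float("inf")
    n, m = len(a), len(b)
    dp = [[INF] * (m + 1) for _ in range(n + 1)]
    dp[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Extend by matching a[i-1] with b[j-1], repeating either point.
            dp[i][j] = cost + min(dp[i - 1][j], dp[i][j - 1], dp[i - 1][j - 1])
    return dp[n][m]

print(dtw([0, 1, 2], [0, 0, 1, 2]))  # 0.0: the duplicated 0 is absorbed for free
```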

    Clustering, weighted sequences, and shortest paths

    In this thesis we focus on clustering problems where the input is the ideal relationship between all pairs of objects in the final clustering. More particularly, we concern ourselves with the following problems.

    L1-fitting tree metrics and ultrametrics: We are given the ideal distance between all pairs of n objects, and the goal is to output a weighted tree (resp. ultrametric) which spans the set of objects and minimizes the sum of pairwise distance errors. Both problems are closely related to evolutionary biology and the reconstruction of the tree of life; in fact, discussions related to the reconstruction of the optimal tree trace back to Plato and Aristotle (350 BC), in the context of classification. Both problems were known to be APX-Hard, and the best known approximation factor was O((log n)(log log n)) by Ailon and Charikar [FOCS '05]. We design asymptotically optimal constant-factor approximations for both problems. Our paper "Fitting Distances by Tree Metrics Minimizing the Total Error within a Constant Factor" appeared in FOCS '21.

    Constrained Correlation Clustering: For each pair of objects, we are given a preference related to whether the two objects should be in the same cluster or not. Furthermore, we are also given hard constraints for certain pairs. The output clustering must satisfy all hard constraints and minimize the number of violated preferences. We design a deterministic combinatorial algorithm with a constant approximation factor. A key ingredient in our approach is a novel nearly-optimal pivoting algorithm for Correlation Clustering. This is a deterministic combinatorial algorithm achieving the best approximation factor among all known deterministic combinatorial algorithms for Correlation Clustering, not just pivoting ones. Part of these results have been submitted to ICALP.

    Apart from clustering, we also study graph and string problems.

    Multiple-Source Shortest Paths in Planar Graphs: Given an embedded planar digraph with positive edge weights and a face f, we are interested in a data structure supporting shortest-path queries where the source lies on f. The best known data structure, by Klein [SODA '05], requires O(n log n) time for preprocessing and O(log n) time for queries, where n is the number of nodes. We improve the preprocessing/query time to O(n log |f|) / O(log |f|), where |f| is the number of nodes on f. More importantly, our approach is much simpler, requiring only single-source shortest-path computations and contractions. In contrast, Klein's solution required persistence, dynamic trees, and an interplay between the primal and the dual graph. Our paper "A Simple Algorithm for Multiple Source Shortest Paths in Planar Digraphs" appeared in SOSA '22.

    Longest Common Subsequence on Weighted Sequences: Weighted sequences generalize the concept of strings, so that in each position we have a probability distribution over the alphabet rather than a single character. The motivation comes from the inherent uncertainty of the actual methods used for "reading" a DNA sequence. We show that the alphabet size is a crucial parameter for this problem, and provide optimal results both in the case of bounded and of unbounded alphabets. Furthermore, this is the first work on weighted sequences avoiding the Log-Probability model, a simplifying assumption related to exact computation over the reals. Our paper "Longest Common Subsequence on Weighted Sequences" received the Best Paper Award at CPM '20.

    Threshold-based network structural dynamics

    The interest in dynamic processes on networks has been steadily rising in recent years. In this paper, we consider the (α,β)-Threshold Network Dynamics ((α,β)-Dynamics), where α≤β, in which only structural dynamics (edge dynamics of the network) are allowed, guided by local threshold rules executed by each node. In particular, in each discrete round t, each active pair of nodes u and v computes a value E(u,v) (the potential of the pair) as a function of the local structure of the network at round t around the two nodes. If E(u,v)<α then the link (if it exists) between u and v is removed; if α≤E(u,v)<β then an existing link between u and v is maintained; if β≤E(u,v) then a link between u and v is established if not already present. New nodes cannot be inserted as a result of the protocol, and existing nodes cannot be removed. The microscopic structure of (α,β)-Dynamics appears to be simple, so that we are able to rigorously argue about it, but still flexible, so that we are able to design meaningful microscopic local rules that give rise to interesting macroscopic behaviors. Our goals are the following: a) to investigate the properties of the (α,β)-Threshold Network Dynamics and b) to show that (α,β)-Dynamics is expressive enough to solve complex problems on networks. Our contribution in these directions is twofold. We rigorously establish the claim about the expressiveness of (α,β)-Dynamics, both by designing a simple protocol that provably computes the k-core of the network and by showing that (α,β)-Dynamics is in fact Turing-Complete. Second, and most important, we construct general tools for proving stabilization that work for a subclass of (α,β)-Dynamics, and prove speed of convergence in a restricted setting.

    Fitting Distances by Tree Metrics Minimizing the Total Error within a Constant Factor

    We consider the numerical taxonomy problem of fitting a positive distance function D: (S choose 2) → R_{>0} by a tree metric. We want a tree T with positive edge weights and including S among the vertices so that their distances in T match those in D. A nice application is in evolutionary biology, where the tree T aims to approximate the branching process leading to the observed distances in D [Cavalli-Sforza and Edwards 1967]. We consider the total error, that is, the sum of distance errors over all pairs of points. We present a deterministic polynomial-time algorithm minimizing the total error within a constant factor. We can do this both for general trees, and for the special case of ultrametrics with a root having the same distance to all vertices in S. The problems are APX-hard, so a constant factor is the best we can hope for in polynomial time. The best previous approximation factor was O((log n)(log log n)) by Ailon and Charikar [2005], who wrote "Determining whether an O(1) approximation can be obtained is a fascinating question". Comment: 46 pages, Accepted to FOCS 2021 (Full version)
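The ultrametric special case and the total-error objective can be made concrete with a short sketch (our own helper names and toy data, not the paper's algorithm):

```python
from itertools import combinations

def is_ultrametric(dist, points):
    # An ultrametric satisfies d(x,z) <= max(d(x,y), d(y,z)) for all triples,
    # equivalently: the two largest of the three pairwise distances are equal.
    for x, y, z in combinations(points, 3):
        a, b, c = sorted([dist(x, y), dist(x, z), dist(y, z)])
        if b != c:
            return False
    return True

def total_error(D, dist, points):
    # Sum of |D(u,v) - dist(u,v)| over all pairs: the L1 (total error) objective.
    return sum(abs(D[frozenset((u, v))] - dist(u, v))
               for u, v in combinations(points, 2))

# Toy input distances (not an ultrametric) and a candidate ultrametric fit.
pts = ["x", "y", "z"]
D = {frozenset(p): w for p, w in [(("x", "y"), 1), (("x", "z"), 3), (("y", "z"), 2)]}
ultra = {frozenset(p): w for p, w in [(("x", "y"), 1), (("x", "z"), 3), (("y", "z"), 3)]}
d = lambda u, v: ultra[frozenset((u, v))]
print(is_ultrametric(d, pts), total_error(D, d, pts))  # True 1
```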

    A Simple Algorithm for Multiple-Source Shortest Paths in Planar Digraphs

    Given an n-vertex planar embedded digraph G with non-negative edge weights and a face f of G, Klein presented a data structure with O(n log n) space and preprocessing time which can answer any query (u,v) for the shortest path distance in G from u to v or from v to u in O(log n) time, provided u is on f. This data structure is a key tool in a number of state-of-the-art algorithms and data structures for planar graphs. Klein's data structure relies on dynamic trees and the persistence technique, as well as a highly non-trivial interaction between primal shortest path trees and their duals. The construction of our data structure follows a completely different and, in our opinion, very simple divide-and-conquer approach that solely relies on Single-Source Shortest Path computations and contractions in the primal graph. Our space and preprocessing time bound is O(n log |f|) and query time is O(log |f|), which is an improvement over Klein's data structure when f has small size. Comment: Paper accepted at SOSA2
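As a point of reference, the naive baseline simply runs one single-source shortest-path computation per face vertex and answers queries from a table; both the paper's structure and Klein's improve on its preprocessing cost. The class and names below are our own illustration, not the paper's construction:

```python
from heapq import heappush, heappop

def dijkstra(adj, s):
    # Single-source shortest paths: the only graph primitive (besides
    # contractions) that the new divide-and-conquer construction relies on.
    dist = {s: 0}
    heap = [(0, s)]
    while heap:
        d, u = heappop(heap)
        if d > dist.get(u, float("inf")):
            continue
        for v, w in adj.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heappush(heap, (nd, v))
    return dist

class NaiveMSSP:
    # Baseline: |f| Dijkstra runs at preprocessing, O(1)-time queries.
    # The point of the paper is replacing this with O(n log |f|) preprocessing.
    def __init__(self, adj, face):
        self.table = {u: dijkstra(adj, u) for u in face}

    def query(self, u, v):  # u must lie on the face
        return self.table[u].get(v, float("inf"))

adj = {"s": [("a", 2), ("b", 5)], "a": [("b", 1)], "b": []}
mssp = NaiveMSSP(adj, face=["s"])
print(mssp.query("s", "b"))  # 3, via s -> a -> b
```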

    No Repetition: Fast Streaming with Highly Concentrated Hashing

    To get estimators that work within a certain error bound with high probability, a common strategy is to design one that works with constant probability, and then boost the probability using independent repetitions. Important examples of this approach are small-space algorithms for estimating the number of distinct elements in a stream, or estimating the set similarity between large sets. Using standard strongly universal hashing to process each element, we get a sketch-based estimator where the probability of a too large error is, say, 1/4. By performing r independent repetitions and taking the median of the estimators, the error probability falls exponentially in r. However, running r independent experiments increases the processing time by a factor r. Here we make the point that if we have a hash function with strong concentration bounds, then we get the same high probability bounds without any need for repetitions. Instead of r independent sketches, we have a single sketch that is r times bigger, so the total space is the same. However, we only apply a single hash function, so we save a factor r in time, and the overall algorithms just get simpler. Fast practical hash functions with strong concentration bounds were recently proposed by Aamand et al. (to appear in STOC 2020). Using their hashing schemes, the algorithms thus become very fast and practical, suitable for online processing of high-volume data streams. Comment: 10 page
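The repetition strategy the paper aims to avoid can be sketched with a k-minimum-values distinct-elements estimator. The salted SHA-256 hash stands in for a strongly universal family, and all function names are our own, not from the paper:

```python
import hashlib
from statistics import median

def h(x, salt):
    # Salted hash mapped to [0, 1); a stand-in for strongly universal hashing.
    digest = hashlib.sha256(f"{salt}:{x}".encode()).digest()
    return int.from_bytes(digest[:8], "big") / 2**64

def kmv_estimate(stream, k, salt):
    # k-minimum-values sketch: with hashes uniform in [0, 1), the k-th smallest
    # hash value v satisfies (k - 1) / v ≈ number of distinct elements.
    mins = sorted({h(x, salt) for x in stream})[:k]
    return (k - 1) / mins[-1] if len(mins) == k else len(mins)

def median_of_reps(stream, k, r):
    # Classic boosting: r independent sketches, take the median. The paper's
    # point is that one concentrated hash function with a single sketch of
    # size r * k gives the same guarantee with one hash evaluation per element.
    return median(kmv_estimate(stream, k, salt) for salt in range(r))

stream = [i % 1000 for i in range(10_000)]  # 1000 distinct elements
est = median_of_reps(stream, k=64, r=9)
print(round(est))  # roughly 1000
```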